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Iterative methods for elliptic finite element 
equations on general meshes 

R.A- Nicolaides & Shenaz Choudhury 
Department of Mathematics 
Carneg i e-Me 1 1 on University 

1 . Int r oduc 1 1 on 

say that the development of iterative solution 
kinds of discretized partial differential equations 
branch of numerical analysis. Perhaps the greater 
has gone into multigrid algorithms, the next most 
preconditioning methods. Traditionally applied to 
multigrid methods have also recently been 
successfully applied to solving the hyperbolic equations of gas 
dynamics (see [Jaml] for a survey). 

Iterative methods in general have yet to penetrate mainstream 
(elliptic) finite element methodology, where direct solvers are the 
rule. This situation is partly historical, but is also due to the 
difficulty of using multigrid methods in situations of great 
geometrical complexity such as those routinely encountered in 
structural mechanics. It is not easy to automate the construction of 
several increasingly coarse embedded meshes which conform to the 
geometry of an arbitrary domain. Even when this is possible, problems 
with smoothing or other multigrid components may remain. In spite of 
these dif f icul ties progress has been made towards a general 
implementation in [Lohl]. Where three dimensional problems must be 
solved, there does seem to be considerable interest in iterative 
methods, even for routine structural mechanics. The storage and time 
requirements of direct methods for such problems are sufficiently 
large that serious consideration of competing methods is a virtual 
necess i ty . 


It is fair to 
techniques for all 
remains a vigorous 
part of the effort 
common topic being 
elliptic problems. 
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Aside from classical multigrid methods - which we will not 
consider further here - what choices remain? Basically there are two: 
general first or second order recursions with some form of 
preconditioner, and ’’algebraic multigrid”. Both groups have their 
origins in the classical iterative methods. In the first class a 
special role is assigned to the conjugate gradient method, and in the 
second the Gauss-Seidel method is employed. The purpose of this paper 
is to survey some recent developments in this field. Section 2 
introduces notation, conventions used in the analysis of iterative 
methods, indicates criteria for selecting good methods, and defines 
the conjugate gradient method on which many later results depend. 
Section 4 introduces the idea of preconditioning of conjugate 
gradients and defines an important class of preconditioners based on 
approximate Gauss elimination. Section 3 is about ’’algebraic 
multigrid” and its applicability to finite element problems. Section 
5 introduces three newer methods. These are a ’’deflation” method for 
improving the convergence of conjugate gradients, an ’’element by 
element" iterative method, and some recent ideas based on classical 
substructuring, which may have applications to parallel computing. 
Section 6 gives a few computational results illustrating the 
properties of the various methods and also contains some further 
general remarks. 


2 . Pre 1 iminar i es 

The equation to be solved is 

Ku = f (2.1) 

where K is an N x N (symmetric) positive definite matrix. {[Elml] 
contains a survey of iterative methods for nonsymmetric problems.) In 
keeping with the aims of the paper, (2.1) is assumed to come from the 
finite element discretization of an elliptic equation or system. In 
practice, most such problems concern the equation 

div (A grad v) = g (2.2) 

in a bounded domain of IR^ or IR^ with appropriate boundary conditions. 

A denotes a positive definite second or fourth order tensor, 
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frequently piecewise constant and/or highly anisotropic. The 
displacement equations of linear elasticity have the form (2.2) where 
V now denotes the displacement vector and A denotes the elasticity 
tensor. We will usually stay with the scalar case of (2.2) using it 
as a model for the vector case (but see Bercovier’s example in section 
6.2). Concerning the discretization of (2.2) we sha 11 once and for 
all assume that the finite element space chosen uses only function 
values as nodal parameters on the mesh. This is an essential 
restriction for most of the more efficient methods we consider below; 
efforts to circumvent i t tend to involve extra constraints. The 
important point is that efficient methods are not usually designed to 
deal with the general case, and can fail if applied indiscriminately. 

Iterative methods (i.m.) produce a sequence of approximations 

u^^^ to the solution u of (2.1). We define the error = u - u^^^ 

fk) fk) 

and the residual r'‘ ' = f-Ku'^ ' , An important relation is the 
residual equation 


A basic idea of i.m. 
u then we can write 


i s that i f u^^^ 


(2.3) 

is a given approximation to 


u 


u 


(k) 


+ 6 


(k) 


(2.4) 


f k) 

Although ^ is unknown, we can use (2.3) to compute an approximation 
(k) 

e'‘ ^ and then define 


,„(k) ^ Jk) (2.5) 

f k) 

Surprisingly crude choices of ^ lead to convergent iterations. For 
example, even approximating (2.3) by 

(l/a)Ie^^^ = (2.6) 

where a is a carefully chosen scalar and I is the identity matrix 

f k) 

gives a (slowly) convergent algorithm: u'^ u as k This 

me thod , the simplest of all iterative methods was apparently first 
suggested in 1910 [Riel] and i s known as the (1st order) Richardson 
method. Its convergence can be improved by allowing a to vary with k. 
Using (2.6) in (2.5) we get the iteration formula 
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(k+1) (k) ^ (k) 


and from (2.3), (2.4), and (2.7) the error equation 


. (I - «K)e<'') 


f k) 

The Jacobi method defines e ' by 




(2.7) 


( 2 . 8 ) 


(2.9) 


where D denotes the diagonal of K. For this, the error equation is 

g^(k+l) = (I _ (2.10) 

In the method of steepest descents introduced by Cauchy in 1847, (2.7) 

is replaced by 

(k+1) (k) ^ (k) 

u'^ ^=u^'+a, r^^ 

k 

for which 

.(''♦D = (I - 

where =s ( r ^ , r ^^ ^ ) / ( r ^ ^ \ Kr ^ ^ ^ ) , the parentheses denoting ordinary 

inner products of vectors. chosen in this way has the property of 

minimizing (u . Ku) -2(u . f ) . the energy functional associated with (2.1), 

(k) 

down the gradient at the point u'^ ^ . 

(k) 

The classical Gauss-Seidel method defines ' as 

= (l/aj^j^)ij^iJr^^^ (2.13) 


( 2 . 11 ) 


( 2 . 12 ) 


where the subscripts on the ri gh t are evaluated mod N, and i^^ denotes 

the vector with 1 in position k, and 0 in all other positions. 

Letting k = 1.2 N in turn in (2.13) and (2.5) gives one 

iteration of the Gauss-Seidel method. To generalize this to SOR , the 
second term in (2.13) is multiplied by g). the r e laxat i on factor. 

Each of the methods defined above is a first order i.m. This 
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refers to the fact that as in e.g. (2.12) successive errors (and 
iterates) satisfy a first order difference equation. For these and 
other methods it is often possible to determine a number 0<p<l such 
that 

I < I (2.14) 

where C is a constant, and [. [ denotes some norm on IR^ , Provided 
(2.14) is a sharp inequality, p can serve as a performance measure for 
a method; to be useful it must be complemented by a measure of the 
work needed to compute each iterate. 

In the literature on i.m. it is customary to at least evaluate p 
for the test problem consisting of the standard 5 point approximation 
to the Laplacian on a uniform mesh of spacing h = l/(n+l) in a square 
of side 1 in the plane with Dirichlet boundary conditions. Each i.m. 
has its own analysis, usually involving substantial mathematics. 

Here, we shall merely list some results valid for h 0 : 


1st order Richardson: 

P 

= 

1 - Ch^ 

same : variable a, : 

k 

P 

= 

1 - Ch 

Jacobi : 

P 

= 

1 - Ch^ 

Steepest descents : 

P 

= 

1 - Ch^ 

Gauss-Seidel : 

P 

= 

1 - Ch^ 

SOR : 

P 

=: 

1 - Ch. 


Proof of these results may be found in the standard references [Fori], 
[Varl], [Youl], 

The cons tant C which appears is not necessarily the same for each 
me thod , but in every case is independent of h . 

In general . if p = 1 - Ch^ then 0(h ^) iterations are required to 
compute each new decimal digit of the solution. Thus , we see that 
e.g. SOR requires only about h times the number of iterations of 
Gauss-Se ide 1 for a given accuracy for the simple test problem. But 
SOR is much more difficult to use because the parameter w must be 
determined somehow , and the number of iterations can increase 
substantial ly if it deviates even slightly from its exac t theoretical 
value . 

The preceding observation suggests that we should list some 
criteria for an i.m. to be acceptable . Basic ones are the following: 


I 
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1. High rate of convergence (i.e. small p). 

2. Performance should be essentially independent of A in (2.2). 

3. Users should not have to supply parame t e r s or 
interact with the program . 

4. A method should achieve an order of magnitude saving in storage 
and computation time over Gaussian elimination. 

1 is required in order that the iteration can be terminated with a 

reasonable guarantee that the current approximation is a good one. It 

is very difficult to decide whether a slowly convergent iteration has 

in fact ’’converged” . Concerning 2, the methods described above may 

fail for highly anisotropic or highly inhomogeneous materials. For 

example, if the square is bisected parallel to the y axis and the 

material tensor A is diag(l.l) in the left side and diag(#c,K) where 

0<#c<l on the right, then with certain boundary conditions, k appears 

in the above expressions for p as p = 1 - C#ch^ . This gives an 

unacceptably poor rate of convergence if tc < 1. Incidentally, related 

difficulties can arise even with isotropic and homogeneous problems if 

there are high mesh aspect ratios or sudden changes in the mesh 

spacing. Then x measures the mesh ratios etc. Point 3 reflects the 

fact that users are just that, and presumably do not want to become 

experts in i.m. Point 4 is the main rationale for considering i.m. in 

the first place: symmetric banded elimination for the model 2d 

2 3/2 

problem has an operation count of N /2 and requires N storage 
locations. Other demands could be made; for example, it could be 
required that i.m. impose absolutely no more constraints on the user 
than Gauss elimination, but this is not usually feasible. 

We now generalize the previous ideas to second order i.m. The 
basic formula for these is a generalization of (2.7) , 


(k+1) (k) ^ , (k-1) (k) 

u'‘ ^ ^ ^ ^ ^k^k^^ 

where and are scalar parameters, and = 1. The problem 

here is to choose the parameters to have a good convergence rate . As 
with the first order case, they may be chosen as constants (the second 
order Richardson method, due to Frankel [Fral]), as dependent on k 
( the semi-iterative me thod [Varl]) or by a variational approach 
(conjugate gradients). We shall not consider the first two cases any 


I 
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further here, because the correct choice of the parameters requires a 
knowledge of the smallest and largest eigenvalues of K - information 
which is rarely aval lable. For the details see [Hagl ] . [ Var 1 ] . 

[Youl]. Instead, consider the third case, the conjugate gradient 
method, where Wq = 1 and 


a 


k 




and 


1 


(j 


k 


= 1 


1 _ 

“k-i 



(,(k-l)_^{k-D) 


(2.16) 


(2.17) 


This choice for the 2 parameters of the basic formula (2,15) is 
obtained by imposing the 2 orthogonality conditions 

=0 3 = k. k-1 (2.18) 

on the new residual This is not the standard way of writing 

the conjugate gradient algorithm, but it is mathematically equivalent 
to it (see Appendix) and is, perhaps, conceptually simpler. The basic 
result which follows rather surprisingly from (2.15) - (2.17) is that 
r ) is orthogonal to all the previous residuals , not just to the 
two previous ones by construction. This can be easily proved by 
induction, making essential use of the symmetry of K. Thus , after at 
most N steps the residual will be zero, and the current iterate will 
be u itself. In practice N is usual ly too large for the method to be 
used in this way . What occurs frequently is that the residual 
decreases sufficiently rapidly that an acceptable approximation is 
produced after considerably fewer steps. In fact, for second order 
homogeneous and isotropic equations , the required number of iterations 
is usually closer to in two dimensions (^ in 3d). As with the 
first order i.m. the speed of convergence may slow significantly if 
the material properties deviate much from isotropy or homogene i ty . 

Applied to the standard test problem it can be proved [Hagl] that 
for conjugate gradients, the factor p = 1 - Ch. This implies the 
O(^) it erations per digit of accuracy mentioned above. Conjugate 
gradients is the easiest way to get this 1 - Ch factor without 
supplying special problem dependent iteration parameters. Each of the 
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methods so far considered either converge slower than conjugate 

gradients or need such parameters. This is perhaps the main reason 

for the popularity of the method. On the other hand, this is not very 

fast convergence. It would be more satisfactory to have p = 1 - CVh 
1/4 

say. giving 0(N ) iterations per digit, or even faster rates. 

provided they could be obtained at small cost. 

It should be mentioned that making the constant C small is 

another way to improve convergence. This seems to be much more 

difficult than adjusting the exponent of h. to judge from the very 

small number of methods which achieve it. In the conjugate gradient 

setting, such a situation occurs sometimes when there are relatively 

few distinct eigenvalues. These distinct eigenvalues may range from, 
-2 

say 0(1) to 0(h ). giving p = 1 - Ch . but if there are m of them it 

can be proved that at most m iterations will produce the exact 
solution of the linear system, apart from roundoff. In general, there 
are a large number of distinct eigenvalues so that the estimate in 
terms of h is more realistic, but transformations could be sought 
which reduce the number of distinct eigenvalues and make this estimate 
i r re 1 evant . 

Each iteration of conjugate gradients consists of forming 

mat r ix-vec tor products and dot products of vectors which are 0(N) 

operations. Since the number of digits significant in approximating 

the solution to the model differential equation is O(logioN). the 

3/2 

total work count is 0(N logioN). The storage is 0(N) because only 

2 

the nonzeros of K are needed. This is to be compared with N /2 work 
3/2 

and N storage for the solution of the model problem by direct 
me thods . 


3 . Algebraic multigrid famg) 

This technique attempts to extend the basic ideas of regular 
multigrid methods to a more general class of problems. No continuous 
problem underlies the given algebraic system of equations which is to 
be solved and. in particular, no grids are involved. The outline 
given here is based on [Rugl] which contains more details and 
references . 
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3 . 1 Multigrid principles 


Standard multigrid algorithms are based on the following 
principles: consider the residual equation (2.3) 

Ke = r. (3. 1) 

If K is a discrete partial differential operator, and r a 
corresponding source or load vector representing a suitably smooth 
function, then e can probably be well approximated by defining a 
coarser problem 


Ke = r (3.2) 

where ^ denotes a coarsening of the objects involved. For K the 

simplest example is the discrete 5 point Laplacian operator on a 

uniform mesh of side h = 1/2"* in a unit side square. If K K(h) 

^ def 

denotes this approximation, then K = K(2h). For r numerous 

^ def 

approximations can be used, of which the simplest is r = r at points 
of the coarse (2h) mesh. If (3.2) can be solved, then e should 
approximate e on the coarse mesh points. Then it can be interpolated 
(extended) to the fine mesh by a rectangular matrix E taking coarse 
vectors to fine ones, e ^ Ee , so that (2.4) gives 

^(new) ^ ^(old) ^ ^ 33 ^ 

This algorithm requires specifying (1) the coarse grid operator K (2) 
the extension operator E and (3) a smooth residual r. 

For standard finite elements there is a self evident definition 
of E, coming from using the coarse grid trial (shape) functions to 
interpolate to the fine grid. To find K we substitute the coarse 
trial functions into the energy functional and minimize in the usual 
way. It then turns out that 


K = E'KE (3.4) 

so that K can be expressed in terms of K and E. and does not have to 
be chosen independently. In a very similar way, the correct choice 
for r is 



10 


r = E^r. (3.5) 

Thus . the components of (3.2) are fully defined once E is specified. 

The construction of a smooth r is achieved by the application of 
a few iterations of a classical iterative method to an arbitrary 
starting approximation. On (i.j) meshes relaxation methods, such as 
Gauss-Seidel in its point or line versions, are frequently successful. 

In practice. (3.2) is itself reduced by smoothing and coarsening, 
and so on recursively until an easily solved coarse equation, usually 
containing just a handful of unknowns, is reached. This recursion is 
rather irre 1 evan t to the main ma thema t ical properties of mu 1 1 igr i d 
methods. Its signifi cance is practical: without it, (3.2) canno t be 
efficiently solved. 

The finite element algorithm just described was first defined and 
its ( *’W-cyc le” ) convergence analyzed in [Nid]. [Nic2] following 
earlier finite difference work. Many improvements and additions to 
this analysis have since been made. [Had] is a recent reference on 
the theoretical aspects of multigrid methods for elliptic problems. 
Here, we wish to write out the error formula for the two grid method 
in order to motivate some of the amg concepts below. Substituting 
(3.2) into (3.3) and sub trac ting both sides from the exac t solution u 
gives 


^(new) = (I _ EK"^E^K)e^°^^^ . (3.6) 

The key observation is that if 

= Et) (3.7) 

^ X 

for some 77 then since K = E KE . 

= 0 (3.8) 

i.e., we will have the exact solution on completing the coarse grid 

operation. This conclusion is independent of any particular choice of 
E. Of course, there is no reason in general to suppose that (3.7) 
will hold, but we can try to relate the smoothing to the choice of 
coarse grid so that it is nearly true. Then we can expect to get a 
rapidly convergent method. amg attempts explicitly to achieve this 
correspondence . 
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3 . 2 amg components 


In standard multigrid, the coarse meshes are assumed given and a 
smoothing algorithm must be found which enables smoothed residuals and 
errors to be adequately represented on these meshes. In the amg 
context where no meshes exist, the reverse idea is adopted^ the 
smoothing algorithm is chosen first, and based on what it achieves - 
strongly dependent on the properties of K - the coarse operators are 
constructed. This is done by using the method of section 3.1. i.e.. 

by choosing a coarse to f ine extension operator E and def ining the 
other operations in terms of it as in (3.4) - (3.5). This must be 
done recursively for several coarsenings. 

In practice, this idea is not easy to implement, and attention 
has so far been restricted to symmetric positive definite matrices K 
satisfying the additional conditions 

N 

k <0 i j. Z k ^ 0 i.j = 1.2 N. (3.9) 

ij 

Numerical results show that the method continues to converge when 
these conditions are slightly violated. 

Only the Gauss-Seidel method is used as an amg smoother in 
[Rugl]. Relaxation at a single point is given by 

where denotes summation over points already updated in this pass 

f 21 

and Z'^ ^ over those not yet modified. Smoothing occurs for strongly 

connected subsets of the unknowns , where the s treng th is measured by 
the relative magni tude of the off-diagonals |k.j|. 

Next , note that for the errors e., (3.10) gives 

e = -{1/k )( £ ). (3.11) 

(3.11) can be further simplified by ignoring the notational 
distinction between Z^^^ and Z^^^, and by setting to zero the terms 
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with only weak (i.e. |k..| is relatively small) connect ions. Let S . 

1 J i 

denote the set of strong connections of point i. Then 

2 k £ (3.12) 

j€S^ ij J 

Let C denote the set of points designated as coarse, and F the 
remainder . Interpolation to an F point is by linear combinations of 
values at C points. Let S^(E) C S^DC denote the set to be used for 

interpolation to point i € F. (3.12) now becomes 

e = -{1/k ){ Z k e + I (3.13) 

jVi 

(3.13) is a good appr oximat i on to the actual smoothing formula we are 
using. Recalling (3.7) we want to use (3.13) as the interpolation 
formula too. But the points in the second sum are not interpolation 
points for i € F. Clearly, we must choose C so that they can be 
expressed in terms of Sj,(E)U{i} values. Several ideas have been 

proposed. Good results [Rugl] have been obtained by choosing C so 
that points in the second sum in (3.13) are strongly connected to the 
set S^(E). For these points the following approximations are now 

made ^ 


£ S (k £ + Z k £ )/(k + Z k ). (3.14) 

^ ^ ^ ^ ^ £€S.(E) ^ 


Substituting (3.14) into (3.13) gives e. = 2 v..e. for i € F. 

^ 4 <TO rr\ ^ J J 

where 


j€S^(E) 


Ij 


= -(k^j . ,)/(k^^ . c..) 


(3.15) 


and 
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c..= 2 k..k„./(k..+ 

«CS^(E) 

ejii 


m€S. (E) 




)• 


(3.16) 


For i € C the interpolated value is just the coarse value existing at 
i. This defines E, and the rest of the problem setup follows 
automatically, as in section 3.1, 

It remains to give explicit rules for the construction of C and 
F. It is very important that C contains a relatively small number of 
points, because |cl is the size of the coarse problem. A low rate of 
coarsening would produce a large number of successively coarser 
levels, causing problems with storage as well as efficiency. Finding 
suitable sets C is thus crucial to the operation of the algorithm. 

The algorithm given in [Rugl] for finding C is. interestingly, 
considerably more complicated than the amg algorithm itself, involving 
much graph theoretical manipulation and empirical testing. We will 
not describe it here. Full details including flow charts and rules 
for picking parameters are in [Rugl]. 


3 . 3 Appl ications 


Some figures from [Rugl] give an idea of the behavior of the amg 
algorithm described above. Several examples are reported including 
severely anisotropic and nonhomogeneous scalar second order elliptic 
problems. The meshes were structured although of course this fact was 
not explicitly used. The performance is uniformly good on all the 
test problems, once the setup phase is complete. The latter appears 
to cost about 1-2 times as much as the solution itself, so that 
presumably a number of solutions would have to be done to neutralize 
this cost. Convergence factors are quite remarkable, being well below 
0.1 for the Poisson equation on an h = 1/64 mesh, giving an 
expectation of at least one new correct digit each iteration, and less 
than 0.12 for all the test problems. For the work to actually solve a 
model problem with a given precision, see section 6.4. 


4 . Precondl t ionlng and precondi t loners 


In this section, we will survey a class of techniques for 



14 


improving the convergence of i.m. Although these techniques can be 

used quite generally, we will consider only their application to 

conjugate gradients (eg). It has long been known that eg works best 

for matrices which have small condition numbers X /X . , the number 

max min 

of iterations required per digit of accuracy being proportional to the 
square root of the condition number. Convergence is also improved if 
there are many nearly identical eigenvalues. ’'Preconditioning” refers 
to the general strategy of transforming K to reduce its condition 
number or cluster or otherwise redistribute its eigenvalues. 


4 . 1 Using preconditioners 


Suppose we have constructed a representation of K in the form 


K = LL" + R 


(4.1) 


where L is lower triangular and R can be interpreted as the error of 

T 

the approximate factorization of K = LL = M. Then we can solve 


M~^Ku = M 


(4.2) 


instead of (2.1) with the expectation of a smaller condition number or 
of better clustered eigenvalues for M than for K. A minor point is 
that M is not symmetric, so that eg may not converge for (4.2) as 
it stands. But using the definition of M it can be transformed into 

L“^KL‘^v = L"^f V = L^u (4.3) 

where the coefficient matrix is symmetric positive definite. This 
appears slightly inconvenient in that it does not refer to the 
original variable u, but it is easy to express the iteration in terms 
of approximations u^^^ convergent directly to u. Indeed, (2.15) gives 
for (4.3) 

(k+1) (k) 

k 

fkl 

where s'* ' denotes the residual 




(4.4) 


of (4.3). 


I 
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= (s 


(k) 


g(k) )/(s(k) 


iKL-Ts(k)) 


(4.5) 


and is defined by the recursion formula corresponding to (2.17). 


Defining = L . (4.4) gives 


uO-*') = 


(4.6) 


while (4.3) gives 


= L ^(f - KL = L 


(4.7) 


SO that (4.6) can be expressed as 


, fk+1) (k) ^ , (k~l) ^ „-l (k) 


(4.8) 


Similarly, by (4.7) 


(4.9) 


and G), continues to be given by the formula (2.17). 


1_ ^ 1 °k (r^^).M~^^^>) 

“k ■ "k-1 “k-1 


(4.10) 


To summarize, (4.8) - (4.10) are the formulas for preconditioned eg 
with preconditioner M. It is clear that the only extra work consists 
of forming M r'^ ^ at each iteration. 

This formulation shows that any convenient M can be used as 

T 

preconditioner and that knowing its factored form (LL ) is 
theoretically superfluous. Some very simple preconditioners can 
exploit this,\for example diagonal preconditioning in which the 
preconditioner is D. the diagonal of K. More complicated choices of 
M, required for faster convergence, should also be cheap to form, 
store and invert. This motivates the definition of M as the product 
of incomplete factors in the next subsection. These factors preserve 
the sparsity pattern of K, so that only 0(N) operations are needed for 
the forward and back substitutions to compute M r'^ The factors 

themselves are computed, once and for all, in 0(N) operations, and 



16 


require only 0(N) storage, 

4 . 2 Incomplete factorizations 


The most important class of preconditioners is based on an idea 
known as incomplete f ac t or i za t i on . This idea was first suggested by 
Varga [Var2], although a Russian paper [Bull] the same year - 
according to some remarks in [Stol] - contains the algorithm of [Dupl] 
which itself contains a detailed implementation of Varga’s idea. 

These authors consider the application of preconditioning to first 
order i.m. for 5 or 9 point difference formulas in rectangles. The 
application to eg was suggested much later by Meijerink and Van der 
Vorst [Meil], who also give other extensions. 

We will quickly review the standard matrix formulation of 
Gaussian elimination without pivoting. Let be the ith stage 

matrix with pivot row i. and columns 1.2, . . . .i-l already 

eliminated. Then with = K. 




i = 1 .2 N-1 


(4.11) 


where is unit lower triangular, having multipliers 


-k(^)/k(^) 

J 1 11 


j = i + 1 N 


(4.12) 


in column i, and zeros in the other off diagonal positions. Then 

k("> = lC-*). . s U (4 

where U i s upper triangular . It follows by a simple direct 
calculation that 


K = LU (4.14) 

where L is unit lower triangular with the negatives of (4.12) in the 
same places below the diagonal . 

Instead of (4.11), consider the more general recursion with 

= K, 


,(1*1) , f(l),(l) . r(D 


i 


1 . 2 , . 


. ,N-1 


(4.15) 
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where eliminates column i from and called the local 

error matrix, is any N x N matrix not introducing fill and such that 

the (i + l)th step is well defined. Denoting by IT. the product 

3 ♦ ^ 

1)_ _ _ and by its inverse, it follows from (4.15) 

that 


IT K-(C^^^+7I 

N-1 , 1^ ■ ,2*^ 




( 2 ) 


+ R^^ (4. 16) 


f N ) 1 N- 1 

where ^ is upper triangular. Hence, defining ^ = IT ’ and 

^ (4.16) gives 

K = + . . . + (4.17) 

= «"tt + R (4.18) 

where R denotes the comb inat i on of the 1 ocal error ma t r ices. (4.18) 
shows that is the Gaussian decomposition of K ~ R. If the R^^^ are 

well chosen, then can approximate LU of (4.14) and be used as a 

precondi tioner , 

A significant specialization of (4.17) occurs when R^^^ is zero 
in and above its i th row, so that modifies the exact elimination 

result only below the pivot row. In this case, (4.17) becomes 

K = + R^^^ + R^^^ + . . . + (4.19) 


because 


Ijl.iR(i) _ («n)) ^ 


(fi(i))-l 


R 


(i) 


(4.20) 


and the successive inverse elimination matrices use only pivot rows 1 
through i of R^^^. Thus R is just the sum of the local error 
matrices . 

There are two more or less standard ways to choose R, both 
controlling the fill in S and In the first approach, R^^^ is 

simply defined to contain the fill from the i th step of the 
elimination. This is the method of [Meil]. It follows that wherever 
K has nonzero entries, R has zero entries and thus, since 


m = K - R 


(4.21) 
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K and agree on the nonzeros of K. In the second approach [Dupl], 
[Dup2] a modification is made to the R just defined, consisting of 
adding to the diagonal of at each stage the negative of its row 

sums, so that the resulting matrix has zero row sums. Thus, we keep 
agreement of the off diagonal nonzero terms of K with those of the 
approximate factorization, while giving up agreement of the diagonal 
terms in favor of having the row sums of K and equal. The second 
approach is usually considerably better than the first in terms of 
convergence of preconditioned eg (peg). 

In both of these methods, the zero patterns of B and match 
those of the lower and upper triangular parts of K. This suggests 
seeking B and ^ directly in this form, by multiplying them together 
and equating to K. The extra nonzero terms (analogous to fill) 

arising in this product are put into R. In a similar way, a Cholesky 

type decomposition can be obtained. This approach is usual in the 
finite difference literature and is especially helpful when there is 
an (i.j) mesh available. However, it is possible to show that the 
different approaches lead to the same approximate factors. 

The above approximate factorizations are called incomplete 
factorizations, and abbreviated as ”ILU” (for incomplete LU) or ”IC” 
(incomplete Cholesky). If the zero row sum feature is incorporated, 

the methods are usually called the modified incomplete LU (MILU) or 

Cholesky factorizations (MIC). 


4 . 3 Theoretical results 


There are two groups of results available for ILU methods. The 
first concerns the existence and stability of the approximate 
factorizations, and the second gives estimates for the condition 
numbers of the preconditioned matrices, from which the rate of 
convergence of peg can be found. In this section we will state 
representative theorems of this kind. 

It follows from (4.21) that B^ is the decomposition of K - R. In 
the ILU setting, if K is pos i t i ve def ini te then R will be symmetric 
with 0 diagonal and so indefinite. If K also has nonpositive off 
diagonal entries, then in the MILU case R will have a positive 
diagonal, negative off diagonal entries and zero row sums. Hence, it 
is positive semidef ini te . In both cases therefore, it follows that 


19 


the matrix actually factored is less positive definite than K. It is 

conceivable “ and can actually happen for merely positive definite 

matrices - that the construction of i and ^ can break down. On the 

other hand, if K - R is positive definite not only does the 

factorization exist, it is also stable. 

In [Meil] existence and relative stability are proved for 

M-matrices, i.e. . matrices K such that k. , < 0 for i 9 ^ j and K ^ > 0. 

i J 

For such matrices, it is known that Gauss factorization is well 
def ined . 

Theorem 1 [Meil] 

Let K be an M-matrix. Then the ILU factorization algorithm is 
well defined for K. Moreover the factorization is relatively stable 
in the sense that the ILU pivots are at least as large in magnitude as 
those of the exact LU process, and |^| < |l| elementwise where L is 

the exact lower triangular factor of K. 

For MILU we have another theorem. 

Theorem 2 

Let K satisfy the conditions 

k.^. <0 i 7 ^ i. k. . > 0 i.j = 1,2. . . . ,N (4.22) 


and 


k,, > - Z k (4.23) 

Then for MILU, K - R is positive definite. 

Condition (4.23) can be replaced with the weaker > condition with at 
least one strict inequality, provided each row of K contains a nonzero 
element after the diagonal [Axel]. A class of matrices satisfying 
this and also the conditions of Theorem 1 is given by linear finite 
element discretizations of the scalar problem (2.2) with Dirichlet 
boundary conditions provided all of the triangles are acute. Other 
than this, there does not seem to be any large class of symmetric 
finite element problems satisfying the conditions of either theorem. 
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But in practice both methods have been successful in cases where these 
conditions are not too strongly violated. Also, convection-diffusion 
equations discretized by upwind schemes often satisfy the conditions 
of Theorem 1 [Mei2]. 

The condition number of the preconditioned matrix M can be 
computed for model problems. Chandra proved the following result. 

Theorem 3 [Cha2] 

For the Laplacian operator -A with Dirichlet boundary conditions 
in a unit side square in IR^ . discretized by linear elements on a 
standard triangulation of side h. and preconditioned by ILU there 
exist constants and such that 

Cih"^ < cond(M~^K) ^ Cah"^. (4.24) 

From (4.24) it follows that ILU peg does not have an improved 
rate of convergence over regular eg in terms of powers of h. 
Nevertheless, it is observed in applications that ILU peg does give a 
more efficient algorithm than eg alone in the sense that a smaller 
number of iterations and a smaller amount of work are needed to solve 
with given accuracy. But as h decreases, the required number of 
iterations to compute a new digit increases as h If we have a 

method where this figure increases like some smaller power of h. then 
as h reduces, at some point the second method will become cheaper. 
Where this point actually occurs is practically unpredictable and has 
to be found experimentally. For the model problem of Theorem 3 it is 
proved (for finite differences) in [Gusl] that a slight variation of 
MILU has cond(M ^K) = 0{h ^) giving 0(h iterations per digit, 

potentially a large saving over ILU. The variation consists of adding 
a quantity of 0(h ) to the diagonal during the factorization. We will 
call this MILU+ for brevity. In practice, the same performance is 
seen for MILU. 

Theorem 4 [Gusl] 

For the problem in Theorem 3. for MILU+, constants Di , D 2 exist 
such that 

Dih"^ i cond(M~^K) ^ Dah"^. (4.25) 

In the case of an arbitrary mesh and for other generalizations 
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the 0(h ) condition number can be obtained, although once again only 

by introducing one or more problem dependent parameters. [Axel] 
contains an extensive account of these developments. 


4 . 4 Further developments 

This subsection gives several refinements and extensions of the 
basic algorithms. First, we will consider procedures which increase 
the density of nonzero elements in 2 and [Mei2] contains a 

suggestion for doing this with finite differences. The ILU procedure, 
suitable for (i.j) difference schemes, is to define a prior set P of 
entries in i where nonzeros are permitted to occur at least including 
the nonzero positions of K. In all other positions, = 0 by 

definition. ^ has the transposed nonzero positions. Then we generate 
the permitted entries in B and ^ by 


. 

1 1 


B . . 
1 J 


. 

1 J 


i-1 
= k. . - Z 
k=l 


1 1 


ik ki 

j-1 




i-1 

k. . - 2 « 'll, . . 

ij ik kj 


(4.26) 


These are the usual recurrences, but with the entries not in P 
omitted. For mesh calculations with e.g. 5 point formulas, a 
convenient choice for P consists of K*s nonzero bands together with 
some nearby ones [Mei2]. 

Clearly, this technique is not very useful for general meshes. A 

r s 1 r s 1 

more suitable method given by [Gus2] is to recursively define B^ ^ ^ 
( s) 

by defining P'‘ ^ to be the nonzero set associated with the product 
l)<^(s 1) p(^) ig defined to be P, so s = 0 corresponds to the 

original approximate factorization. In general, as s increases p^®^ 
contains more and more nonzero positions, and eventually all of them, 
so that an exact factoring would be required. [Gus2] reports that 
small values of s give the best overall results. 

Another problem which arises in solving general positive definite 
systems is that of negative pivots in ILU and MILU. Conjugate 
gradients is proved to converge only for symmetric positive definite 
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matrices. If ^ doesn’t have a positive diagonal a problem may arise. 
Two papers [Kerl], [Manl] deal with the avoidance of these negative or 
small pivots. In [Kerl] the proposed remedy is to add a sufficiently 
large number to the diagonal culprit in This just corresponds to a 

further addition into the for this step. The argument usually 

advanced is that if not too many of these corrections have to be made, 
their effect on the rate of convergence of peg will be small. Perhaps 
because of its ad hoc nature no rigorous estimate of the effect of 
this modification seems to be known. In [Manl] the prior addition of 
a matrix al to K is advocated. However. [Mei2] points out that this 
is a global change being made to correct what can be conceived as a 
local problem. Moreover, a problem dependent parameter has once again 
crept in. In spite of these objections, some good results have been 
reported in both of the above references. 


5 . Other methods 

In this section, we will very briefly mention three methods which 
have been recently proposed. Much less is known about them than the 
methods considered above but all have some new feature which may be of 
interest. They are ’’deflated eg”, the ’’element by element method”, 
and ’’domain decomposition”. In each case we can do little more than 
describe the algorithm and mention whatever else seems relevant. 


5 . 1 Deflated ct 


This method, described more fully in [Nic3], gives another way to 
improve the convergence of eg or peg. It is closer in spirit to 
multigrid methods than to incomplete factorizations, but is suitable 
for general meshes. 

Recalling '(2.15) we can generalize it to 

(k+1) (k) ^ , (k-1) ^ f W 1? (k)N rc 

where E is N x m (ra < N) and has a meaning similar to the E in section 
3. The idea behind (5.1) is that Ec^^^ ’’deflates” certain 
constituents of the residual, particularly those for which the regular 
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algorithm is ineffective. c ^ is chosen to minimize 
leading to 

e'^KEc^^^ = E^Kr^^^ (5.2) 

(k) 

which must be solved for c'^ ^ at each iteration. Since m < N this is 
feasible. From (5.1) and (5.2) we find 

- aj^Uj^K(I - EK~^E^K)r^^^ (5.3) 

where K s E^KE (c.f. (3.4)). Let P = EK"^e’^K. In (5.3) K(I - P) is 

positive semidefinite since it is symmetric and equals 

where the bracketed matrix is an orthogonal projection matrix, and 

1/2 T roi 

K is positive definite. If E r'^ = 0 then it follows by induction 

f rom (5.3) that 

E^r^^^ =0 k = 1.2 N. (5.4) 

This shows that the residual i s always orthogonal to the column space 
of E and so we can try to set up a eg iteration in the sub space 
null(E ), which presumably will converge faster than in the whole 
space for a good choice of E. To set up the iteration, all we have to 
do is pick the two coefficients in (5.3) - which are arbi trary up 
until now -to force the usual orthogonalities 

,r^^^) = 0 j = k. k-1 

from which it follows as before that 

r^^^) = 0 j = 0.1 k. 

To make E^r^^^ = 0. pick v arbitrarily, let s = f - Kv and solve 
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Kd = E^s 

for d . Then define 

= V + Ed. 

now has the required property. 

Deflation is used in a very similar way to improve a 
preconditioned eg algorithm [Nic3]. 


5 . 2 Cony e r ee nee 


Success with the deflation technique is dependent on a good 
choice of E. A basic strategy for problems with smooth coefficients 
and not too much anisotropy is to divide the domain into m disjoint 
subdomains of approximately equal areas and to associate one column of 
E with each subdomain. The j th column will be zero in every position 
except those corresponding to the unknowns in the j th subdomain, where 
it is equal to 1. (5.4) then implies that the residuals always have 

zero mean in each subdomain. In the general case, the choice of E 
depends on the properties of K. and will have to be made using the 
ideas required to pick the corresponding operator for algebraic 
mu 1 1 igr id . 

If the maximum area of the subdomains is small, then K will be of 

large order, although convergence will be rapid. If the minimum area 

is too large, then K will be a small matrix, but will not much help 

the convergence relative to eg. For the above ’’piecewise constant” 

choice of E. it is proved in [Nic3] that for second order equations of 

the type (2.2) with smooth coefficients and meshes, the error 

2 

multiplier p = 1 - Ch/d where d is the area of the largest 

subdomain. If there are severe anisotropies or discontinuities, their 

effect will show up in C and slow the convergence. E has to be chosen 

somewhat differently for these cases. For the Poisson equation with 

Dirichlet conditions and a uniform mesh [Nic3] proves that choosing 
3/5 S/5 

d = 0(h ) gives a cost of 0(N ) flops per digit. This cost is a 

slight theoretical improvement over say MILU+, for which the 

5/4 

corresponding figure is 0(N ). 

The deflation algorithm has the advantage of more general 
applicability than ILU type methods because it is based on pde theory 


25 


rather than on Gauss elimination. It would apply immediately to 
linear elasticity for example. On the other hand, good choices for E 
are problem dependent for degenerate cases. General software for 
genera t ing E in such cases could be developed, however. 

See section 6.3 for some numerical results. 


5 . 3 Element by element method 


This section contains a brief description of an algorithm 
recently proposed by Hughes et al [Hugl]. Unlike the methods of the 
previous sections, this one is specifically for finite element 
equa t i ons because it uses the fact that the stiffness matrix is a sum 
of element stiffness matrices. In [Hugl] the steady state solution of 
the ode system 


Wdu/dt = Ku - f (5.5) 

u(0) = 0 

is approximated by the iteration 

(» - = Wu*''* - 6tf (5.6) 

u(°> . 0 . 

In this section only it is assumed that K is negatlue definite. W is 
a positive definite matrix chosen to enhance the convergence to steady 
state, e.g. W = -diag(K) , and 6t is the time step. It follows from 
(5.6) after some manipulat i on that 

„(k*l) , „(k) . j^,-l/2„-l/2^(k) 

where as usual, r^^^ = f - Ku^^^ and 

- 1/2 - 1/2 -1 
V = (I - 6tW ^KW ^^^) . 

Noting for finite element systems that 

K = Z K. 
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where K. are element matrices it follows that 


V = (I - 6t2 W ^ 

i 


[Hugl] suggests the approximation 


fits ^ 

1 ' 

1 


u 

n (I 
i = l 


fitw ^ (5.7) 




Each of the inverses on the right is easy to compute, in essence 
reducing to the inversion of a matrix of order equal to that of K^. 

It is also suggested that the approximations (5.7) and 


v„ = n (I - fitw 

2 i=4 ' ^ ' 


be used alternately. The product symmetric which may be 

a desirable property in certain cases. For this, the algorithm 
becomes 


, „(k) . 5,„-l/2^^,-l/2,(k) 


(5.8) 


Of course, the convergence will be improved if partial assemblies are 

carried out, although the cost of the inversions will increase. 

Some further refinements are to use the well known BFGS update to 

- 1/2 - 1/2 

improve the "search direction" W V^W in (5.8) and to use an 

accurate line search in the improved direction. Numerical results for 
a plane strain problem are reported in [Hugl]. 

The line search/BFGS combination actually brings the iteration 
closer to a peg technique. In fact can be used as a eg 

preconditioner. This is investigated in [Noul]. 


5 . 4 Domain decomposition / substructuring 


This section describes some work which is the subject of 
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current attention [Bral], [Chal]. We can give only a general 
impression of the underlying ideas. 

’’Substructuring*' refers to the technique of ordering the 
variables of a finite element system in such a way that the 
coefficient matrix takes the form 


K 


11 


K 


1 s 


K 


22 


K 


2s 


"‘Is "L 


K , ,K , 

s-ls-l s~ls 

T 

, K 
s- 1 s ss 


(5.9) 


The diagonal blocks are usually square matrices, although this is not 
necessary [Gunl], Usually, the variables belonging to each diagonal 
block are associated with some disjoint physical subdomains, and those 
belonging to the last block column are variables associated with the 
interfaces between (and disjoint from) the subdomains. Such an 
ordering is frequently convenient on physical grounds, where the 
subdomains may represent different parts of a physical structure. 

These orderings also seem attractive for parallel processing. The 
recent interest is mostly motivated by this last factor. 

Specifically, the (block) last row of (5.9) can be slmul taneous ly 
eliminated giving a block upper triangular system with coefficient 
matrix 


where 


K 


11 


K 


22 


K 


1 s 


K 


2s 


s-ls-1 s-ls 


(5.10) 


C = K - kT K, }k, 
ss Is 11 Is 


- ''I-lsCls-A-ls 


(5.11) 


Back substitution with this matrix can be achieved with another 
simultaneous operation once the last (vector) unknown representing the 
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interface variables is found. It is the solution of the latter 
equation, say 


CUg = g (5.12) 

which is the bottleneck for the parallel implementation of this 
method. We are confronted with a familiar situation- if s is large, 
each K.. can be of small order and easy to invert. But C will be of 

large order and expensive to form and invert. If s is small. can 

be large and more difficult to invert, while C will be small, fairly 
easily formed and easily inverted. 

For parallel implementation it is mostly the first case which is 
important. However, the gains from parallelism can be lost because of 
the problems of solving (5.12) for large C. To circumvent this, 
several suggestions have recently been made for solving (5.12) 
iteratively. It can be proved without difficulty that for the type of 
problem we are considering. C is positive definite so that eg is 
naturally suggested. The residual can be simultaneously computed 
using (5.11) and the L.U. factors of K... The main problem is then to 

speed the convergence of the eg iteration. The important new point is 
that the elements of C are not explicitly given, so that 
preconditioners of the earlier sections cannot be easily used. 

Some recent work has dealt with the case of the Poisson equation 
on a uniform mesh in a rectangular domain divided horizontally by one 
or more mesh lines into subdomains. For this case. Fourier techniques 
can be used to find the eigenvalues and eigenvectors of C explicitly. 
Based on this, [Dryl] proposes the choice of preconditioner M as 


M = vT 


2 

where L = (1/h ){-l 2 -1} with suitable boundary conditions, and 
proves that M has eigenvalues independent of h. [Goll] generalizes 
this to 


M 



+ 4L. 


It is unclear whether these have any use for arbitrary substructures 
of a given domain, but it seems unlikely. On the other hand, both 
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generalize formally to the case of several horizontal strips. 

Another preconditioner which has been suggested is, for two 
subdomains , 


M 


K 


33 




13* 


[Bjol] contains a full account of the motivation for this and its use. 


6 . Numerical examples 

This section reports some numerical results for eg. ILU and MILU 
peg, deflated eg and amg . It is not intended that any method be 
selected as ’’best” from these results. Each method has its strengths 
and weaknesses which these results do not fully reveal. The idea is 
just to give some feeling for what efficiency can be expected in a few 
special cases. Also, only methods for which a fair amount of 
published data exists, or for which the authors have personal 
experience are reported. 


6 . 1 eg and peg 


The first set of numerical examples are from [Cha2] and deal with 
the Poisson equation for 2 and 3 dimensional problems with Dirichlet 
boundary conditions. The domains are the square (0,1) x (0,1) (2d) 
and the unit cube (3d), with h = 1/64 (2d) and h = 1/16 (3d). The 
initial distribution is ’’random” (distribution unknown) and the 
termination criterion is i 10 Computations were all 

done in single precision on a PDPIO. Results are shown in Tables 1 
and 2. Flop counts are approximate and setup times are not included. 

eg ILU MILU 

#itns 180 47 27 

#flops/10~® 3.8 1.5 .86 

Table 1 (2d) 

eg ILU MILU 

ttltns 47 18 21 
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#flops/10 ® .95 .49 .57 

Table 2 (3d) 

For comparison, solution of the 2d problem by a banded Cholesky 

0 

algorithm would require about 9.3 x 10^ flops. Table 1 shows that for 
the given accuracy, MILU achieves the goal of an order of magnitude 
speed improvement over the direct method. Relative to eg. Table 1 
shows that ILU and MILU respectively require about .39 and .23 of the 
computer time of the unpreconditioned algorithm. For smaller h, the 
MILU preconditioned eg algorithm would probably show larger gains 
relative to the ILU case. 

Table 2 shows a rather worse performance for MILU than ILU. 
Presumably, this can be attributed to the coarseness of the mesh. 

More evidence is required on this point. 

[Cha2] contains comparisons with other preconditioners which we 
have not discussed, either because they are sensitive to problem 
dependent parameters or depend on a special mesh structure being 
avai lable . 

A more recent set of calculations was performed by [Coni]. We 
will give some results from this paper. The square (0,1) x (0.1) with 
h = 1/51 is used for Table 3. which reports results for the 2d Poisson 
equation with Dirichlet boundary conditions. The initial 
approximation was ’’random” with entries in [-1.1] (presumably 
uniformly distributed). Iterations were terminated when 
llr^^^ll < 10 ^llr^^^ll . Double precision arithmetic on an IBM 3081 was 

used . 



eg 

ILU 

MILU 

#i tns 

109 

33 

23 

4tflops/10 

2.7 

1.3 

.92 

a 

-2.0 
Table 3 

-1.97 

o 

00 


This particular run took 1.37 epu secs. 

In the last row of Table 3, a gives the numerically computed exponent 
of h in the condition number of M ^K. The improvement in the 
condition number of MILU over ILU is essentially the theoretical one. 
namely a factor of h. On the other hand. Tables 1 and 3 show that the 
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condition number is not the whole story - the performance of ILU is 
too good to be explained this way. It is the clustering of the 
eigenvalues of M which is responsible for the good behavior of ILU. 
[Cha2] and [Coni] contain direct computations of the spectrum of M 
which support this. 

Table 3 shows that ILU and MILU need respectively about .48 and 
.34 of the standard eg computation time. These are a little worse 
than the corresponding figures for Table 1. Presumably this is 
accounted for by the different termination rules, and initial 
approximations . 

[Coni] contains two more difficult computations, one with a 
piecewise constant material tensor and another with pure Neumann 
boundary conditions. In the former case, the square with lower left 
corner at (1/4, 1/4) and upper right corner at (3/4) . 3/4) is given 

the material constant 1000 and the remainder of the original square is 
given its previous value 1. The rest of the details are as above for 
the Poisson equation. The latter example also has piecewise constant 
coefficients although not with wide variations, and a term au , which 
ensures unique solvability. o is relatively small, and piecewise 
constant. For this case only, h = 1/43. This example is due to 
[Var 1 ] , and is used in [Gus2] as well. Tables 4 and 5 show the 
results. 



ILU 

MILU 

tti tns 

47 

32 

lops/10~® 

1.9 

1.3 

est. cond. no. 46770 

40 

Table 

4 



ILU 

MILU 

#i tns 

74 

53 

#flops/10 ^ 

2.2 

1.6 

Table 

5 



The strong effect of the eigenvalue distribution is evident from Table 
4. in which the ratio of the estimated condition numbers is greater 

3 

than 10 , and yet the ratio of the number of iterations is less than 
1.5. On the other hand MILU is the more efficient algori thm in both 
cases. Table 5 shows that the third example is the most difficult. 
Probably the Neumann condition is the major reason for this. A banded 
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0 

Cholesky algorithm would require about 1.8 x 10 flops for solving the 
third problem, and Table 5 shows that MILU is nothing like the desired 
order of magni tude better than this. h is rather large for this 
example, and greater relative savings would be seen if it was 
decreased . 

[Coni] deals specifically with preconditioning of block 
tridiagonal matrices such as those arising from 5 point difference 
formulas with mesh lines parallel to the coordinate axes. Over 30 
preconditioners are compared for such problems, some of them quite 
sophisticated. It is very interesting to note that for problem 3 only 
one preconditioner was more than twice as good as MILU. for problem 2 
none were twice as good, and for the Poisson problem none were more 
than 3.3 times as good. Moreover, none of them required less storage, 
most of them needing quite a bit more, as well as more complex 
programming. This is in addition to the fact that for the majority of 
the algorithms it is not clear how to correctly apply them to general 
mesh problems. 


6 . 2 Bercovier * s examp 1 e 


In this section, we shall show some results from [Berl] of a 
calculation for which negative pivots are encountered in the ILU 
factorization but for which formal application of peg gives good 
results. According to [Berl] the results obtained this way are ’’far 
better” than those using either of the remedies in [Kerl] and [Manl]. 
The example is that of an orthotropic cantilevered beam, 10 units long 
by 1 unit deep, in plane strain. The load f is applied at the free 
end. Discretization is by bilinear elements on a uniform 10 x 3 mesh. 
Letting 1 and 2 denote the principal directions of the or tho tropic 
material, and e. ^ , £ 3 » ^12 Q-nd the corresponding strains 

and stresses, the elasticity matrix is defined by 


D 


10"^ 30 o' 

30 1 0 . 

0 0 10 


Three cases are considered, in which the angle /? between the x axis 
and direction 1 is 0°, 45°, and 90° . The negative pivots occur in the 
latter two cases. The results are given in Tables 6, 7, and 8. 
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eg ILU 

ttitns 133 7 

llrllf .00080 .00014 

Table 6 (P = 0°) 

eg ILU 

>300 14 

.00003 

Table 7 {P = 45°) 

eg ILU 

>300 19 

.0025 

Table 8 (P = 90°) 

Unfortunately. [Berl] does not give much more detail than that 
reproduced above, so it is difficult to draw specific conclusions. 
Clearly though, further investigation is warranted. 

6 . 3 Def lation 

Relatively few computations have so far been carried out with 

deflation; here we record its performance against eg on the model 

Poisson problem with Dirichlet be. The discrete equations were solved 

using n square subdomains each containing n nodes of the triangulation 

for n=9 . 16 , 25 . 36 . 49 . 64 . 81 , where n is the number of interior nodes 

2 

along cross-sections and N = n . For Table 9. the initial error was a 

smooth function, and iterations were terminated when the rms error was 
—6 

reduced below 10 of its initial value. 


n '= 

9 

16 

25 

36 

49 

64 

81 

eg itns 

25 

43 

67 

96 

130 

171 

216 

deg i tns 

17 

24 

29 

36 

41 

45 

52 

eg time 

.004 

.02 

.06 

. 18 

.45 

1 

2.0 

deg time 

.003 

.01 

.03 

.08 

. 16 

.30 

.56 


Table 9 


#i tns 
llrllf 


tns 

llrllf 


The last two lines show times relative to the eg time for n = 64. The 
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first two lines are well fitted by the formulas 2.7n and sVn 
respectively. The exponent in the second of these is accounted for by 
the theory in [Nic3], Thus, the number of iterations is rising much 
more slowly for the deflated case, as is the overall time. The times 
given include factoring times for the deflation matrix and other 
overheads . 

Comparison with MILU is not easy, but there seems to be a small 
time advantage (~20%) with deflation applied with the above choice of 
E for the model problem. The E chosen above is not the optimum one 
for this problem - slightly smaller subdomains are needed for that - 
but the difference is small. It needs to be mentioned that deflation 
requires quite a bit less work per iteration than MILU. because most 
of the operations carried out at the full mesh level are additions. 
Also, there is no problem with deflation "breaking down" on more 
general problems. But deflation requires choosing E properly and 
presently good choices are only known for a restricted range of 
prob 1 ems . 


6 . 4 amg 


An estimate of the work required to solve the model Poisson 
problem can be inferred from [Rugl]. There, for h = 1/64 a 
convergence factor p = .054 is reported, so that to reduce an initial 
error by 10 requires 

6/ 1 logi o . 054 1 ~ 5 iterations ("cycles"). 

According to [Rugl] 85 flops/mesh point/iteration are used by the amg 
algorithm, giving a work count of about 

1.7 X 10^ f lops . 

to reduce the error below 10 
Setup costs consist of 

1. Computing the interpolation weights 

2. Forming the coarse grid operators 

3. Construction of the coarse mesh sets C. 
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The third of these requires more work than the first two combined. 
[Rugl] gives a figure of about 5-9 amg iterations for 1-3. Total 
storage is about 3 times that required for storage of the problem 
itself. Thus, for the model problem at least, the setup costs are 
less than twice the solution cost. 


6 . 5 Commen t s 


As already stated, no attempt should be made to order the methods 
on the basis of these results. Nevertheless, it is worth pointing out 
where each method can be expected to do well and not so well. First, 
it seems probable from the amg and MILU results for the model problem 
that neither has any clear advantage over the other, since the factor 
of about 2 in favor of MILU could easily change with the method of 
accounting. Moreover, if h becomes smaller, there should be an 
advantage with amg. The setup time for MILU is an order of magnitude 
less than for amg and, of course, the programming is far less complex. 
Similar comments apply to deflation vis-a-vis amg. both for speed and 
storage for the model problem. 

In more difficult examples, it seems that amg deteriorates less 
rapidly than MILU. although the latter does what can be considered a 
reasonable job in most cases. Again, there may be an advantage in 
convergence speed for amg when h becomes relatively small, although 
the setup costs remain to be neutralized. For problems with more 
general geometry, and for three dimensional problems MILU seems not to 
have been adequately tested. For three dimensional problems, amg also 
has not been tested so far. 


Appendix 


In this appendix, we summarize the properties of the classical 
conjugate gradient algorithm, and obtain the second order form used in 
the paper. For solving 


Ku = f 

where K is symmetric positive definite and N x N, the algorithm is 
thi s : 
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( 0 ) . ( 0 ) r V ( 0 ) ( 0 ) ( 0 ) 

u'^ ^ IS arbitrary. r ' = f - Ku'^ p'^ ^ ^ 

„(k*n , „(k) , ,^p(k) k,o.i. . . . 

p(k+l) 3, k=0.1. . . . 

The coefficients are given by 


where 


'k 




(\r) (1^ ) 

The direction uectors p'^ ^ and the residuals r'^ ^ satisfy the 

following orthogonality relations^ 

(1) (p^*^^Kp^^^) =0 j = 0.1. . . . .k-1 

(2) =0 j = 0.1 k-1. (A2) 

From (A2) it follows that = 0. Some earlier residual can be zero 

in certain cases, but we do not need to worry about this here. From 
[Luel] we have the following bound for the ’’energy” 

of the kth error*- 


Theorem 


£{4'“)) i 4p2‘^E(4<°)) 

where p = { 1 - \/!c)/(l + Vic) and k = X . /X 

^ ^ ^ ^ ' min max 

2 

For the model Poisson problem, the eigenvalue ratio k = 0{h ), 
giving p = 1 - Ch as mentioned in section 2. 

To obtain the second order form of the algorithm, apparently used 
first in [Rutl] (see also [Con2]), eliminate from (Al) using 

p<“> = 
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and 


(k-1) (k) (k-1) 

k-1*^ 


The result is 


(k+1) (k) ^ , (k“l) ^ (k) 

u'- * - ^k^^ °^k^k^ 


(A3) 


where = !• = “"'k^k- l^"^k-l “k*^k ~ ”*^k’ define P_^ = 0 

for consistency with (Al). From (A3) it follows that 




(A4) 


( kl 

Since the coefficients in (A4) involve the vectors , we shall 

determine them afresh, directly from (A4). To do this, note that the 
property (A2) implies in particular that 


,(k), . ,/k^l) ,(k-l)j ^ 0 


In conjunction with (A4) the first implies 




(A5) 


Similarly, the second implies 

0 = ^ . r ' ) 

^ ,, (k“l) (k-l)^ 

0 = (j^(r'^ ^.r'‘ ^ ) 


- “k“k^^ 






by the symmetry of K. This last term can be rewritten using (A4) with 

( k) 

k:= k-1 and taking its inner product with r , Then 

0 = „^.(,(k-l).,(k-l), , ,(k), 


from which it follows that 


^k (rt^).rW) 


k ^ "k-1 “k-1 (r^^ ) 


(A6) 
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(A3) with (A5) - (A6) and the initial contition (Jq = 1 now define the 
second order form of the algorithm. 

It can be proved by induction that the property (A2) is preserved 
for (A3), (A5) - (A6) with Dq = 1. There remains the question of 
whether the u'^ ^ of the standard form and u'‘ ^ of the second order 
form are indeed equal. In fact, all of the iterates of the two 
methods are identical (for exact arithmetic). This can also be proved 
by induction, although the calculations are longer than for the proof 
of (A2). 
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